A three-round algorithm is presented that guarantees agreement in a system of K ≥ 3F+1
nodes provided each faulty node induces no more than F faults and each good node
experiences no more than F faults, where F is the maximum number of simultaneous faults
in the network. The algorithm is based on the Oral Message algorithm of Lamport, Shostak,
and Pease, is scalable with respect to the number of nodes in the system, and applies
equally to the traditional node-fault model and the link-fault model. We also present a
mechanical verification of the algorithm, focusing on verifying the correctness of a bounded
model of the algorithm and on confirming claims of determinism.

Nomenclature

D = minimum communication event-response delay
d = network communication imprecision
y = D+d
Sync = communication message
F = maximum number of faults in the network
K = number of nodes in the network
i = 1..K
Ni = ith node
fi = number of faults associated with Ni


I. Introduction 


Distributed systems have become an integral part of safety-critical computing applications, necessitating system
designs that incorporate complex fault-tolerant resource management functions to provide globally coordinated
operations with ultra-reliability. As a result, robust clock synchronization has become a required fundamental
component of fault-tolerant safety-critical distributed systems. Synchronization has practical significance as a
fundamental service for higher-level algorithms that solve other problems. For example, in safety-critical TDMA
(Time Division Multiple Access) architectures, synchronization is the most crucial element.
Typically, the assumed topology is a regular graph, such as a fully connected graph or a ring, since these provide a
base case for solving the distributed synchronization problem.

A fundamental property of a robust distributed system is the capability of tolerating and potentially recovering 
from failures (loss of service due to a fault) that are not predictable in advance. A fault is a defect or flaw in a
system component resulting in an incorrect state. The requirement to handle faults adds a new dimension to the
complexity of synchronizing distributed systems. In the context of fault-tolerant distributed systems, a fault
presenting different symptoms to different observers is known as a Byzantine (arbitrary) fault. We assume that there 
are a maximum of F simultaneous faults in the network. 

We call an approach to solving the clock synchronization problem direct if it relies solely on local (node level) 
detection and filtering of faults. This approach is primarily limited to detecting timing and/or value faults of a 
node’s incoming messages. In contrast, we call an approach indirect if it relies on the network level detection and 
filtering of faults independent of, and in addition to, the local detection and filtering of the faults. This approach,
however, requires coordination at the network level.

Thus far, there is no verifiable solution for the general case of the clock synchronization problem, where the
topology is arbitrary and any number of various types of faults are tolerated. Furthermore, most attempts have
tried to solve this problem directly, although there are some approaches that solve it indirectly using
authenticated (signed) messages. Driscoll, Hall, Sivencrona, and Zumsteg (Ref. 6), however, argue that “while the
arguments of unforgeable signed messages make sense in the context of communicating generals, the validity of 
necessary assumptions in a digital processing environment is not supportable. In fact, the philosophical approach of 
utilizing cryptography to address the problem within the real world of digital electronics makes little sense. The 
assumptions required to support the validity of unbreakable signatures are equally applicable to simpler approaches 
(such as appending a simple source ID or a CRC to the end of a message). It is not possible to prove such 
assumptions analytically for systems with failure probability requirements near 10^-9/hr.” Furthermore, addressing
network element imperfections, such as oscillator drift with respect to real time and differences in the lengths of the 
physical communication media, is necessary to make a solution applicable to realizable systems. 

The main issue in solving the clock synchronization problem is a lack of symmetric view (agreement) in the 
system among the participating good nodes in the sense that two good nodes may disagree on the message sent. 
However, there are a number of ways of achieving message symmetry across the system. Refs. 5 and 7 address various
ideas for overcoming failures in a robust distributed system, including tolerating Byzantine faults.
In solving the consensus problem, which is the ability of a set of nodes to agree on a single value despite failures, 
Schmid, Weiss, and Keidar argue in Ref. 8 that: “A fully-fledged n-process consensus algorithm is obtained by 
using a separate instance of a Byzantine agreement algorithm (with n-1 receivers) for disseminating any process’s 
local value, and using a suitable choice function (majority) for the consensus result.” The consensus problem, and
hence the idea proposed by Schmid, Weiss, and Keidar, is based on an inherent assumption of synchrony among the
good nodes, and so is not applicable to solving the clock synchronization problem.

Other methods include using a variety of engineering practices, e.g., using a self-checking pair at the node level
or central guardians at the system level. However, as Driscoll, Hall, Sivencrona, and Zumsteg reported in Ref. 6,
the correctness claims of these approaches may not be verifiable. Furthermore, we believe that to be generally useful,
algorithms that guarantee agreement must be able to handle non-authenticated messages. Thus, the crux of our idea,
as proposed in Ref. 13, is to solve this problem indirectly by first converting any message to a symmetric message,
and then using a verified protocol based on the symmetry assumption to solve the synchronization problem.

The Oral Message (OM) algorithm of Lamport, Shostak, and Pease, which solves the Byzantine Agreement (BA)
problem, is also an indirect approach, and is meant to reliably transform a message from a single source to a
symmetric message (an agreement) for a fully connected network. The OM algorithm has been proven to reach
agreement at the network level for a given source and does not require initial synchrony among the good nodes.
The OM algorithm requires F+1 rounds of exchanges and, with a message complexity of O(K^(F+1)), the number of
exchanged messages grows exponentially as F grows linearly. Therefore, the use of the OM algorithm for F > 2 is
very costly and impractical.

In this paper, we present an alternative for achieving agreement, hereafter referred to as the 3ROM (3 Rounds using
OM) algorithm, that is based on the OM algorithm. The 3ROM algorithm guarantees agreement in a system of K ≥ 3F+1
nodes, where F is the maximum number of simultaneous faults in the network. The 3ROM algorithm assumes each node Ni,
i = 1..K, either induces up to F faults if it is a faulty node, or experiences no more than F faults if it is a good node.
The 3ROM algorithm is independent of the fault model (node-fault or link-fault model) and, as the name implies,
achieves agreement in three rounds. Thus, it is independent of the number of faults (in terms of the number of required
rounds, not the number of messages). The algorithm has a message complexity of O(K^3), and is also scalable with
respect to K. We also present the model checking results of a bounded model of the algorithm to verify its
correctness.

This paper is organized as follows. We describe the fault models in Section II. In Section III we provide a 
system overview. We present the 3ROM algorithm and its formal proof in Section IV. In Section V, we present
the model checking efforts toward verification of correctness of a bounded model of the algorithm and the results of 
that effort. Finally, we present concluding remarks in Section VI. 

II. Fault Models


In synchronous message-based distributed systems, a fault is typically defined as a message that was not 
transmitted when it was expected or a message that was transmitted but not received or received but not accepted, 
i.e., deemed invalid by a receiver. Thus, the fault is either associated with the source node of the message, the 
corresponding link between the source node and the destination node, or the destination node. Consequently, there 
are two viewpoints, node-centric and link-centric, and thus, there are two ways of modeling faults. In the
node-centric model, which we refer to as the node-fault model, the faults are associated with the source node of the
message and all fault manifestations between the source and the destination nodes for the messages from that source
count as a single fault, which is especially the case when the faults are associated with a Byzantine faulty node. In
this model all links are assumed to be good. Miner, Geser, Pike, and Maddalon, for instance, model the absence of a
link as a link fault and, even though both node and link failures are considered, they abstractly model link failures
as failures of the source node.

In the link-centric model, which we refer to as the link-fault model, a fault is associated with the communication means
connecting the source node to the destination node. In this model, all nodes are assumed to be good and an invalid 
message at the receiving node is counted as a single fault for the corresponding input link. Thus, from the global 
perspective, a Byzantine faulty node manifests as multiple link failures. 

A link-fault model introduced by Schmid, Weiss, and Rushby is called the perception-based hybrid fault model,
where faults are viewed from the perspective of the receiving nodes. Faults are associated with their input links, and
all nodes are assumed to be good. They argued that since F faulty nodes can produce at most F faulty perceptions in
any node, the link-fault model is compatible with the traditional node-fault model and so all existing lower bound and
impossibility results remain valid.

“In the perception-based model, the system-wide number of faults is replaced by the number of faults that are
observable in the nodes’ local “perceptions” of the system. Formally, node r’s perception vector
V_r = (V_r^1, V_r^2, ..., V_r^n) is considered, where every perception V_r^s ∈ V_r represents the message node r
received from node s in some specific round; type and value(s) depend upon the particular algorithm considered.”
In that paper, Schmid, Weiss, and Rushby present a solution for the synchronous deterministic consensus problem,
where all nodes are expected to achieve agreement on a single value, in synchronous distributed systems with link faults.


III. System Overview 


We consider “synchronous” message-passing distributed systems and model the system as a graph with a set of 
nodes (vertices) that represent the pulse-coupled entities and a set of communication links (edges) that represent 
their interconnectivity. As in the OM algorithm, the underlying topology considered is a fully connected network
of K nodes that exchange messages through a set of communication links. The system consists of a set of good
nodes and a set of faulty nodes. A good node is assumed to be an active participant and to correctly execute the
algorithms. A faulty node is either benign (detectably bad), symmetrically faulty, or arbitrarily (Byzantine) faulty. 
However, in this paper our primary focus is Byzantine faults. 

The communication links are point-to-point and unidirectional, each connecting a source to a destination node. 
Thus, the fully connected graph consists of K(K-1) unidirectional links. A good link is assumed to correctly deliver 
a message from its source node to its destination node within a bounded communication delay time. A faulty link 
does not deliver the message, delivers a corrupted message, or delivers a message outside the expected 
communication delay time. 

The nodes communicate with each other by exchanging broadcast messages. Broadcast of a message by a node 
is realized by transmitting the message, at the same time, to all nodes that are directly connected to it. The 
communication network does not guarantee any relative order of arrival of a broadcast message at the receiving 
nodes, that is, a consistent delivery order of a set of messages does not necessarily reflect the temporal or causal 
order of the message transmissions. A maximum of F faults are assumed to be present in the system, where F > 0.
We assume K ≥ 3F+1 and define the minimum number of good nodes in the system, G, by G = K-F. The
minimum number of nodes needed to maintain synchrony is well established to be 3F+1.


A. Communication Delay 

The communication delay between directly connected (adjacent) nodes is expressed in terms of the minimum 
event-response delay, D, and network imprecision, d. These parameters are measured at the network level. A
message broadcast by a node at real time t is expected to arrive at its directly connected adjacent nodes, be processed,
and cause subsequent messages to be generated by those nodes within the time interval [t+D, t+D+d]. Communication
between independently-clocked nodes is inherently imprecise. The network imprecision, d, is the maximum time
difference among all receivers of a message from a transmitting node with respect to real time. The imprecision is
due to many factors including, but not limited to, the drift of the oscillators with respect to real time, jitter,
discretization error, temperature effects, and differences in the lengths of the physical communication media. These
parameters are assumed to be bounded, D > 0, d ≥ 0, and both have units of real-time clock ticks, with their values
known in the network. The communication delay, denoted y, is defined in terms of D and d as
y = D+d, and has units of real-time clock ticks. In other words, we assume synchronous communication and bound
the communication delay between any two directly connected adjacent nodes by [D, y]. However, for simplicity of
notation, in the remainder of this paper we assume that the messages arrive logically at the same time at the 
destination nodes. 
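As a minimal sketch of the timing bound above, the following Python helpers compute the communication delay and the interval in which the responses to a broadcast are expected. The function names and the use of integer clock ticks are illustrative assumptions, not part of the original presentation.

```python
from typing import Tuple


def communication_delay(D: int, d: int) -> int:
    """Return y = D + d, the upper bound on the delay between adjacent nodes."""
    if D <= 0 or d < 0:
        raise ValueError("expected D > 0 and d >= 0 (in clock ticks)")
    return D + d


def response_window(t: int, D: int, d: int) -> Tuple[int, int]:
    """Return the real-time interval [t+D, t+D+d] for a broadcast sent at time t."""
    return (t + D, t + communication_delay(D, d))
```

For example, with the hypothetical values D = 10 and d = 2, a message broadcast at tick 100 falls in the window from tick 110 to tick 112.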


B. The Sync Message And Its Validity 

In order to achieve and maintain desired synchrony, the nodes communicate by exchanging Sync messages, 
where synchrony is defined as a measure of the relative imprecision of the good nodes. Assuming physical-layer 
error detection is dealt with separately, the reception of a Sync message is indicative of its validity in the value 
domain. A Sync message from a given source is valid if it arrives at or after one D of an immediately preceding
Sync message from that source; this is the message validity in the time domain, i.e., valid Sync messages are
rate-constrained. Again, assuming physical-layer error detection is dealt with separately, the reception of a Sync
message is indicative of its validity in the value and time domains.
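The rate constraint described above can be sketched as a simple check on arrival times. This is only an illustration: the function name is hypothetical, and treating the first message from a source as valid (because it has no predecessor) is an assumption that the text does not state explicitly.

```python
from typing import Optional


def is_valid_sync(arrival: int, previous_arrival: Optional[int], D: int) -> bool:
    """Check the rate constraint for a Sync message from one source.

    arrival          -- arrival time of the current Sync message (clock ticks)
    previous_arrival -- arrival time of the immediately preceding Sync message
                        from the same source, or None if there is none
    D                -- minimum event-response delay (clock ticks)
    """
    if previous_arrival is None:
        # Assumption: with no preceding message there is nothing to violate.
        return True
    # Valid only if it arrives at or after one D of the preceding message.
    return arrival - previous_arrival >= D
```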


IV. 3ROM 


In a synchronous distributed system, the Oral Message algorithm of Lamport, Shostak, and Pease solves the
Byzantine Agreement (BA) problem by reliably transforming a message, in the presence of faults, to a symmetric
message at the network level, whereby the good nodes reach an agreement and collectively either accept or reject the
message. The OM algorithm is recursive and every iteration of the execution of the algorithm constitutes a step
(round) of exchange of messages by the nodes. An instance of the OM algorithm starts with the source node
broadcasting a message, the first round, followed by other nodes (except the source of the message) recursively
rebroadcasting (relaying) the messages they receive to others in subsequent rounds. For a fully connected graph of
K nodes, the OM algorithm requires F+1 communication rounds. At the end of the F+1 rounds, the nodes vote and
reach agreement.

In this section we present a three-round algorithm, similar to the OM algorithm, that achieves agreement among 
the good nodes, within a time bound, independent of F, provided that K ≥ 3F+1, where K is the number
of nodes, and fi ≤ F for i = 1..K, where fi is the number of faults associated with Ni. For the node-fault model, the
faulty nodes do whatever they want as long as their behavior is bounded by the assumption, i.e., at any round a 
faulty node broadcasts a valid message to at least K-F other nodes. Similarly, for the link-fault model, at any round 
no more than F link faults are perceived by a receiving good node. We will also show that this three-round 
algorithm applies equally to the node-fault and link-fault models. To simplify presentation of this algorithm, we 
assume broadcast messages arrive logically at the same time at their destination good nodes. We later remove this 
simplifying assumption and show that agreement is reached at the good nodes within a time bound. We use two 
types of messages: Sync and Relay, and extend the message validity argument of Section III.B to both messages. 


The 3ROM Algorithm 


The algorithm depends on two positive parameters α and β, which are used in determining the final
acceptance or rejection of a Sync message from a source node. Hence, the algorithm described is actually
3ROM(α, β). The algorithm consists of three rounds and a vote.


Round 1 — The source node broadcasts a Sync message to all other nodes, effectively saying “I’m here.” A
node does not send a message to itself even though it uses its own message. The nodes that receive the
message record that the message was received.


Round 2 — All nodes (including the source node) that received a Sync message in Round 1 broadcast a Relay
message to all other nodes, essentially saying “I’ve got a message.” The good nodes that do not receive
the Sync message do nothing.


Round 3 — All nodes broadcast a vector of K messages containing what they have received from all other
nodes in Rounds 1 and 2, effectively saying “This is what I’ve received from others.”


Vote — At the end of Round 3, each node locally constructs a K×K network-level matrix M of received
messages, with entries M(i, j) ∈ {s, r, 0}, i, j = 1..K, where ‘s’ indicates having received a Sync message,
‘r’ indicates having received a Relay message, and ‘0’ indicates not receiving any message, i.e., a fault. A
column ci of the matrix M corresponds to the messages transmitted by Ni and a row ri of the matrix M
corresponds to the messages received by Ni. For i = 1..K, let Xi = 1 if Σci > α and Xi = 0 otherwise,
where Σci is the number of non-zero entries in column ci, i.e., treating ‘s’ and ‘r’ entries equally. Finally,
the node votes “accept” if ΣXi > β, i.e., the node accepts the message from the source node if more than β
columns of M have more than α non-zero entries each.
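The vote step can be sketched in Python as follows. This is an illustrative reading of the rule above, not the authors' implementation: the function name, the list-of-lists matrix layout (row i holds what node i received, column j holds what node j transmitted), and the string encoding of the entries are assumptions, and the two thresholds are written `alpha` and `beta`.

```python
from typing import List


def vote(M: List[List[str]], alpha: float, beta: float) -> bool:
    """Return True ("accept") if more than beta columns of M each have
    more than alpha non-zero entries; 's' and 'r' entries count equally."""
    K = len(M)
    X = []
    for j in range(K):
        # Count the non-zero entries in column j (messages transmitted by node j).
        nonzero = sum(1 for i in range(K) if M[i][j].strip() != "0")
        X.append(1 if nonzero > alpha else 0)
    return sum(X) > beta


# Hypothetical 4-node example with thresholds (K/3, 2K/3):
K = 4
M = [["r", "r", "r", "0"] for _ in range(K)]  # node 4 relayed nothing
accepted = vote(M, K / 3, 2 * K / 3)
```

In this small example three columns exceed the column threshold, and three is more than 2K/3, so the node votes “accept”.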


In addition to the network-level matrix M that each node constructs at the end of Round 3, the proofs to follow
rely on another related matrix, MGlobal, that can be constructed at the end of Round 2. Following Round 2, each node
has a vector of received messages that describes the messages it received from each node. The matrix MGlobal, with
entries defined as in the matrix M, i.e., MGlobal(i, j) ∈ {s, r, 0}, reflects a global view of the network at the end
of Round 2. While it is inaccessible to any node, it is related to the matrices M built by each node.

We need to point out that in the 3ROM algorithm, in rounds 2 and 3, if a node does not receive a message, it does 
not broadcast anything. This is in contrast to the OM algorithm where, after the first round, a node that did not 
receive a message is required to transmit a default message to others. 


A. Proof Of The 3ROM Algorithm For Link-Fault Model 
The proof of correctness of the OM algorithm, and consequently, the proof of correctness of the 3ROM 
algorithm, is centered on the following two properties. 
AP (Agreement Property): If receiver nodes p and q are nonfaulty, then they agree on the value ascribed to the 
transmitter. 
VP (Validity Property): If the transmitter is nonfaulty, then every nonfaulty receiver computes the correct 
value. 


Theorem 1. For a fully connected graph with K > 3F nodes and the link-fault model, i.e., fi ≤ F, 3ROM(K/3, 2K/3)
guarantees agreement at the good nodes. 


Proof. The source node is a good node (all nodes are good in the link-fault model). 


Round 1 — Ns, the source node, broadcasts a Sync message and it is received correctly (valid) by at least K-F nodes.
Let H be this set of nodes.


Round 2 — Each node in H broadcasts a Relay message to all other nodes and at least K-F other nodes will receive
the Relay message correctly. Hence, for each node in H, its corresponding column in MGlobal has at least
K-F non-zero entries, i.e., ‘s’ and ‘r’.


Round 3 — Each node broadcasts its vector of received messages to all other nodes and at least K-F nodes will
receive it correctly. At the end of this round, a node constructs its matrix M, which is similar to, but
likely different from, MGlobal. Since the rows of M are the messages in Round 3, at most F rows can be
different from MGlobal. Thus, the number of non-zero entries in any column of MGlobal is reduced by at most F
in M, and the columns in M corresponding to nodes in H have at least K-F-F non-zero entries. Since K > 3F
(equivalently, K/3 > F), there are at least K-F > 2K/3 columns in M (the nodes in H) with at least
K-2F > K/3 non-zero entries. Thus, each node votes “accept” for 3ROM(K/3, 2K/3).
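The two counting bounds used at the end of this proof can be checked numerically. The sketch below is only a sanity check under the stated condition K > 3F; the function name is hypothetical, and exact fractions are used to avoid floating-point rounding at the thresholds K/3 and 2K/3.

```python
from fractions import Fraction


def theorem1_bounds_hold(K: int, F: int) -> bool:
    """Check K-F > 2K/3 (enough columns) and K-2F > K/3 (enough entries)."""
    alpha = Fraction(K, 3)      # column threshold
    beta = Fraction(2 * K, 3)   # vote threshold
    return (K - F > beta) and (K - 2 * F > alpha)


# The bounds hold for every K > 3F in a small range, e.g. K = 7, F = 2
# gives 5 > 14/3 and 3 > 7/3.
all_hold = all(
    theorem1_bounds_hold(K, F)
    for F in range(0, 6)
    for K in range(3 * F + 1, 3 * F + 10)
)
```

With K = 3F exactly (for instance K = 6, F = 2), the first bound becomes 4 > 4 and fails, which illustrates why the strict inequality K > 3F is needed.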


Table 1 is an example of the network-level matrix at the end of Round 3 for F = 2 and K = 7. The grayed cells
along the diagonal in this matrix are the messages a node sends to itself and thus cannot be faulty. An “s r” entry
indicates that a Sync message was received in Round 1 and a Relay message was received from the same node in Round 2.

Table 2 shows the matrices at N1 and N2, as examples, at the end of Round 3. The grayed rows indicate the effects
of link faults at the two nodes in Round 3.

Table 1. An example of matrix of received messages at the end of Round 3.

Ni | 1   | 2 | 3 | 4 | 5 | 6 | 7
1  | s r | 0 | 0 | r | r | 0 | 0
2  | s r | r | 0 | 0 | r | 0 | 0
3  | s r | r | r | 0 | 0 | 0 | 0
4  | s r | 0 | r | r | 0 | 0 | 0
5  | s r | r | r | r | r | 0 | 0
6  | 0 r | r | r | r | r | 0 | 0
7  | 0   | r | r | r | r | 0 | 0
Xi | 1   | 1 | 1 | 1 | 1 | 0 | 0

Table 2. An example of matrix of received messages at N1 and N2 at the end of Round 3.

N1 | 1   | 2 | 3 | 4 | 5 | 6 | 7
1  | s r | 0 | 0 | r | r | 0 | 0
2  | s r | r | 0 | 0 | r | 0 | 0
3  | s r | r | r | 0 | 0 | 0 | 0
4  | s r | 0 | r | r | 0 | 0 | 0
5  | s r | r | r | r | r | 0 | 0
6  | 0   | 0 | 0 | 0 | 0 | 0 | 0
7  | 0   | 0 | 0 | 0 | 0 | 0 | 0
Xi | 1   | 1 | 1 | 1 | 1 | 0 | 0    V = 1

N2 | 1   | 2 | 3 | 4 | 5 | 6 | 7
1  | 0   | 0 | 0 | 0 | 0 | 0 | 0
2  | s r | r | 0 | 0 | r | 0 | 0
3  | s r | r | r | 0 | 0 | 0 | 0
4  | s r | 0 | r | r | 0 | 0 | 0
5  | s r | r | r | r | r | 0 | 0
6  | 0 r | r | r | r | r | 0 | 0
7  | 0   | 0 | 0 | 0 | 0 | 0 | 0
Xi | 1   | 1 | 1 | 1 | 1 | 0 | 0    V = 1

B. Proof Of The 3ROM Algorithm For Node-Fault Model 


In the classic node-fault model, a node’s message may be perceived as faulty by many other nodes. However, 
we assume there are up to F simultaneous Byzantine faulty nodes present in the network and they behave arbitrarily 
but are limited to inducing no more than F faults at each round, i.e., fi ≤ F.


Theorem 2. For a fully connected graph with K ≥ 3F+1 nodes and the node-fault model, i.e., fi ≤ F, 3ROM(K/3,
2K/3) guarantees agreement at the good nodes.


Proof. Note that a bounded-Byzantine faulty node can be modeled by a link-fault model, where the faulty links are 
exactly those where the received messages are invalid. Thus, whether the source node is good or Byzantine faulty, 
since the maximum number of bounded-Byzantine faulty nodes, F, is less than a third of K and since the link-fault 
model allows each node up to F faults per round, the proof of Theorem 1 applies in this case.


Table 3 is an example of the matrix after Round 3 at Ni. We would like to point out that, unlike Table 2 for the
link-fault model, this matrix is the same at all good nodes except for the rows and columns corresponding to the
faulty nodes, i.e., c6, c7 and r6, r7, respectively.

Table 3. An example of matrix of received messages at the good nodes Ni, i = 1..5, at the end of Round 3. N6 and N7
are the Byzantine faulty nodes and N6 is the source node.


Theorem 3 (Agreement). For any F and K, for a fully connected graph with K ≥ 3F+1 nodes and fi ≤ F, the 3ROM
algorithm satisfies AP at the good nodes.


Proof. It follows from Theorems 1 and 2 that, if the assumptions are met, the 3ROM algorithm always guarantees
agreement at the good nodes.


Theorem 4 (Validity). For any F and K, for a fully connected
graph with K ≥ 3F+1 nodes and fi ≤ F, algorithm 3ROM satisfies
AP and VP.


Proof. It follows from Theorems 1 and 2 that, if the assumptions are met, the 3ROM algorithm satisfies the 
agreement and validity properties at the good nodes. 


Corollary 5. The number of rounds required by 3ROM algorithm is independent of F. 


Proof. It follows from Theorem 3 that the 3ROM algorithm always guarantees agreement at the good nodes in three 
rounds and regardless of a particular value of F. 


Theorem 6. For a fully connected graph with K ≥ 3F+1 nodes and fi ≤ F, the link-fault model supersedes the
node-fault model.


Proof. By the assumption fi ≤ F, a node either is faulty and induces up to F faults per round, or is
good and experiences no more than F faults per round. Given the link-fault model, the 3ROM algorithm converts
any message from a node Ni to a symmetric message in three rounds. With the link-fault model, the nodes are
considered to be good even though the faults are manifested on their links. Thus, given up to F faults per node (i.e.,
the maximum number of outgoing faulty links per node), a maximum of KF faults per round are tolerated. With the
node-fault model, a maximum of F faulty nodes are assumed to be present with up to F faulty outgoing links per
faulty node; thus, a total of F^2 faults per round are tolerated. Since F^2 < KF for F > 0, the link-fault model
supersedes the node-fault model.
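The comparison in this proof is simple arithmetic, sketched below for concreteness. The function name is hypothetical; the two quantities are the per-round fault totals named in the proof (KF for the link-fault model, F squared for the node-fault model).

```python
from typing import Tuple


def tolerated_faults_per_round(K: int, F: int) -> Tuple[int, int]:
    """Return (link_model, node_model) fault totals tolerated per round."""
    link_model = K * F   # up to F faulty outgoing links at each of K nodes
    node_model = F * F   # up to F faulty nodes, each with up to F faulty links
    return link_model, node_model


link_total, node_total = tolerated_faults_per_round(7, 2)
```

For K = 7 and F = 2 this gives 14 faults per round under the link-fault model against 4 under the node-fault model.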


Thus far, we assumed that the Byzantine faulty nodes behave arbitrarily but are limited to inducing no more than 
F faults at any round. We now weaken the assumption of fi ≤ F so that a faulty node behaves arbitrarily in Round 2
and/or Round 3. The 3ROM algorithm still achieves agreement, but the voting criterion needs to be adjusted to
accommodate this weakened assumption, i.e., 3ROM(K/3, K/3+1). One manifestation of faulty behavior is for the
node to not broadcast anything during Round 2 and/or Round 3. Note that when a node fails crash-silent, fi = K and
it can readily be detected from the network-level matrix at the end of Round 3 since its corresponding column will 
have at least K-F zeroes. This diagnosis information can potentially be used at the network level. We now show 
that the 3ROM algorithm still achieves agreement when the source is a Byzantine faulty node. 


Theorem 7. For a fully connected graph with K > 3F nodes and the node-fault model, i.e., fi ≤ F, when the source
node is a Byzantine faulty node, 3ROM(K/3, K/3+1) guarantees agreement at the good nodes. 


Proof. The source node is a Byzantine faulty node. 

Round 1 — The source node broadcasts a Sync message to at least K-F nodes, of which at least K-F-F = K-2F are good
nodes that receive the message correctly (valid): up to F good nodes do not receive the message and up to F nodes
are simultaneously faulty. Let H be this set of good nodes.

Round 2 — Each node in H broadcasts a Relay message to all other nodes and at least K-F other nodes will receive
the Relay message correctly. Hence, for each node in H, its corresponding column in MGlobal has at least K-F
non-zero entries.

Round 3 — Each node broadcasts its vector of received messages to all other nodes and at least K-F nodes will
receive it correctly. At the end of this round, a node constructs its matrix M, which is similar to, but likely
different from, MGlobal. The matrix M at a good node Ni is different from MGlobal in at most F rows and F
columns corresponding to the Byzantine faulty nodes. Also, the matrix M at a good node Ni is identical to the
matrix M at other good nodes Nj, j = 1..G and j ≠ i, except in the same rows and columns corresponding to the
Byzantine faulty nodes. Therefore, the columns in M corresponding to nodes in H have at least K-F non-zero
entries. Furthermore, the column corresponding to the faulty source node has at least K-2F non-zero entries
(the nodes in H) in M. Since K > 3F (equivalently, K/3 > F), there are at least K-2F+1 > K/3+1 columns in M
(the nodes in H plus the faulty source) with at least K-2F > K/3 non-zero entries. Thus, each node votes
“accept” for 3ROM(K/3, K/3+1).


Note that when the source is a good node (node-fault model), since the set H consists of at least 2K/3 good
nodes, this weaker assumption holds and Theorems 2 and 7 apply. Also, although this weaker assumption does not
apply to the link-fault model (all nodes are good) and Theorem 1 still holds, since with this weaker
assumption β = K/3+1 and K/3+1 < 2K/3, Theorem 7 applies to both models.


C. Message Observation Window, Agreement Within A Time Bound 
Earlier in this paper we stated that, to simplify the explanation of the problem and our proposed solution, a
transmitted message from a single source arrives at the receiving nodes logically at the same time. In this section
we revisit this assumption and justify it. In an implementation, as explained in Section III.A, a given message
from a single source arrives at the receiving nodes within d units of each other. Figure 1 is a depiction of a
message through three rounds of the 3ROM algorithm. Ns is the source node, and Ni and Nj represent the nodes that
receive the message at the two extremes of the communication latencies, i.e., D and D+d = y, respectively,
i ≠ j ≠ s. Thus, unless proper measures are taken, consequent relaying of messages at subsequent rounds widens
message arrivals at the nodes for every round by an additional d. In this figure, ‘↑’ indicates broadcasting a
message, ‘↓’ indicates receiving a message, and the labels on these arrows, s, i, and j, correspond to Ns, Ni, and Nj,
respectively, as the initiators of the messages.

Figure 1. Message observation window = [-2d, d] from y.

For the following lemmas, Ns is the source node initiating Round 1, Ni is the node that receives the message at
the earliest time, i.e., D, and Nj is the node that receives the message at the latest time, i.e., D+d = y.


Lemma 8. Message observation window for Round 2 is [y-2d, y+d]. 


Proof. Round 2 begins with Ni and Nj relaying the messages they received from Ns. For this round, the nodes relay 
the messages as soon as they receive them, i.e., within at most d of each other. Thus, at the end of Round 2, the 
messages arrive at the nodes within 2d of each other. Since a node does not physically send a message to itself but 
uses its own message, its message is assumed to arrive at itself at y, to account for the worst-case message delivery 
time. At the end of Round 2, the earliest a message can arrive is at Nj, from the first node that started its Round 2, 
i.e., Ni. As shown on the timeline of activities, this message arrives at the longest delay minus the accumulated drift 
for two rounds, i.e., y-2d. Similarly, the latest a message can arrive is at Ni, from the node that started its Round 
2 last, i.e., Nj, at the longest delay plus the initial drift between the nodes from the previous round, i.e., y+d. 
Thus, the window of observation for message arrival for Round 2 is [y-2d, y+d]. 
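
The bounds in this proof can be checked numerically. The following is a minimal sketch, not part of the original analysis; it assumes that the window is measured from each receiving node's own start of Round 2, and the function name round2_window is hypothetical.

```python
def round2_window(D, d):
    """Return (earliest, latest) Round 2 arrival offsets.

    Assumption: offsets are measured from the receiver's own Round 2 start.
    """
    y = D + d
    start_i = D  # Ni receives the Round 1 message first and relays at once
    start_j = y  # Nj receives the Round 1 message last and relays at once
    # Earliest case: the relay of Ni reaches Nj after the minimum delay D.
    earliest = (start_i + D) - start_j
    # Latest case: the relay of Nj reaches Ni after the maximum delay y.
    latest = (start_j + y) - start_i
    return earliest, latest


D, d = 10, 2
y = D + d
print(round2_window(D, d))      # (8, 14)
print((y - 2 * d, y + d))       # (8, 14)
```

Under these assumptions both lines print the same pair, i.e., the window [y-2d, y+d].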


Lemma 9. All good nodes participating in Round 2 finish Round 2 and start Round 3 within d of each other. 


Proof. From Figure 1, from the start of Round 1 to the end of Round 2, the earliest message (EM) and latest 
message (LM) arrival times at Ns are EMs = y+D-d and LMs = y+y, respectively. Similarly, for Ni, EMi = D+d+D and 
LMi = D+d+y, and for Nj, EMj = y+D-d and LMj = y+y. Simple algebra gives ΔEM = d and ΔLM = 0 for 
any two nodes, i.e., the nodes finish Round 2 and start Round 3 within d of each other. 
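
As a worked check of the algebra, the sketch below evaluates the earliest and latest arrival times given in the proof for sample values of D and d, then takes the largest difference between any two nodes. The helper names round2_arrivals and spread are hypothetical, and the sample values are arbitrary.

```python
def round2_arrivals(D, d):
    """Earliest and latest Round 2 arrival times per node, as stated in the proof."""
    y = D + d
    return {
        "Ns": (y + D - d, y + y),
        "Ni": (D + d + D, D + d + y),
        "Nj": (y + D - d, y + y),
    }


def spread(values):
    """Largest difference between any two values."""
    return max(values) - min(values)


arrivals = round2_arrivals(D=10, d=2)
delta_em = spread([em for em, _ in arrivals.values()])
delta_lm = spread([lm for _, lm in arrivals.values()])
print(delta_em, delta_lm)  # 2 0
```

With D = 10 and d = 2, the earliest arrivals differ by at most d = 2 and the latest arrivals coincide, which matches the claim of the lemma.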


Lemma 10. Message observation window for Round 3 is [y-2d, y+d]. 


Proof. It follows from Lemma 9 that the nodes start Round 3 within d of each other. By an argument similar to that 
of Lemma 8, the window of observation for message arrival for Round 3 is [y-2d, y+d]. 


It follows from Lemmas 8 through 10 that at the end of each round the nodes are within d of each other, thus 
justifying the rationality of our assumption that messages arrive logically at the same time. 


D. Complexity Of The 3ROM Algorithm 

In the 3ROM algorithm, since a node does not send a message to itself, the number of messages transmitted per 
node for each round is (K-1). For the worst-case analysis, all nodes participate in Rounds 2 and 3. Thus, the 
total number of messages transmitted per round is (K-1), K(K-1), and K²(K-1) for Rounds 1, 2, and 3, respectively. 
Therefore, the total number of exchanged messages is (K-1)+K(K-1)+K²(K-1), and the message complexity of the 
3ROM algorithm is O(K³). However, if a message is physically broadcast to all nodes, e.g., when the 
communication medium is wireless, then the number of broadcast messages per node for each round is 1. Thus, the 
total number of messages broadcast per round is 1, (K-1), and K(K-1) for Rounds 1, 2, and 3, respectively. 
Therefore, the total number of exchanged messages is 1+(K-1)+K(K-1), and the message complexity of the 3ROM 
algorithm is O(K²). In contrast, the message complexity of the OM algorithm for the above two scenarios is O(K*) 
and O(K*!), respectively. We would like to emphasize that the number of rounds of exchanged messages for the 
3ROM algorithm is independent of F. 
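
The per-round counts above can be tabulated for a given K. The sketch below is illustrative only; the function names are hypothetical, and it simply evaluates the worst-case expressions (K-1), K(K-1), K²(K-1) for point-to-point links and 1, (K-1), K(K-1) for physical broadcast.

```python
def point_to_point_messages(K):
    """Worst-case messages per round and in total when every message is sent link by link."""
    per_round = [K - 1, K * (K - 1), K**2 * (K - 1)]
    return per_round, sum(per_round)


def broadcast_messages(K):
    """Worst-case broadcasts per round and in total when one transmission reaches all nodes."""
    per_round = [1, K - 1, K * (K - 1)]
    return per_round, sum(per_round)


for K in (4, 7, 10):
    print(K, point_to_point_messages(K), broadcast_messages(K))
```

For K = 4 this gives 3, 12, and 48 point-to-point messages (63 in total) and 1, 3, and 12 broadcasts (16 in total). The totals simplify to K³ - 1 and K², consistent with cubic and quadratic growth in K.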


V. Model Checking 


In this section we present a mechanical verification of the 3ROM algorithm using the model checking approach, 
chosen for its ease, feasibility, and quick examination of the problem space, to verify the correctness of our formal 
proof of the algorithm. The Symbolic Model Verifier (SMV) was used to model this algorithm on a PC with 4GB of 
memory running Linux. SMV’s language description and modeling capability provide relatively easy translation 
from the pseudo-code. SMV semantics are those of synchronous composition, where all assignments are executed in 
parallel and synchronously. Thus, a single step of the resulting model corresponds to a step in each of the components. 

A number of cases for each fault model were model checked. In particular, for the node-fault model, scenarios with 
F = 0..3 and K = 4..10, respectively, were model checked with the weaker assumptions, that is, cj => F+J/ and 
Yi = F+2. Model checking of the link-fault model requires a specific number of link faults to be considered. Two 
cases, with F = 2, K = 7 and with F = 3, K = 10, were model checked. Model checking of larger graphs and with more 
node and link faults can readily be accommodated. Due to space limitations we do not discuss the SMV 
models in detail. The models can be found online at http://shemesh.larc.nasa.gov/people/mrm/publications.htm. 
A. Propositions 

Computation tree logic (CTL), a temporal logic, is used to express properties of a system. In CTL, formulas are 
composed of path quantifiers, E and A, and temporal operators, X, F, G, and U. In this section the claim of 
agreement at the good nodes at the end of the third round is examined. The node-fault and link-fault models are 
model checked separately for F = 1, 2, and 3, while the same CTL proposition is used to verify that agreement has 
been reached at all good nodes for both models. 

For model checking of each scenario, a particular node is designated as the source and scheduled to initiate the 
broadcast of a Sync message at a particular time. Since the 3ROM algorithm is deterministic, the final vote time, 
VotingResultTime, is set to the end of the third round after the broadcast of the initial Sync message. Validation of 
the CTL proposition requires examination of an underlying proposition. In particular, the variable VoteTime is used in 
these properties and is defined here. 

VoteTime = (GlobalClock = VotingResultTime) ; 


The GlobalClock is a measure of elapsed time from the beginning of the operation with respect to the real time, 
i.e., external view. The VoteTime is indicative of the GlobalClock reaching its target value of VotingResultTime and 
the GlobalAgreement is defined as the conjunction of voting results at all good nodes. 


Proposition SystemLiveness: AF (VoteTime) 

This proposition addresses the liveness of the system, i.e., whether time advances and the amount of 
time elapsed has advanced beyond the broadcast of the message and the three rounds needed to reach 
agreement on that message, as indicated by VoteTime. 


Proposition GlobalAgreement: AF (VoteTime & GlobalAgreement) 

This proposition encompasses the criteria for the agreement property as well as the claim of determinism. 
The proposition specifies whether or not the system will reach agreement in the three rounds after the 
message was initially broadcast. This property is expected to hold. 
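
To illustrate what an AF property requires, the following is a minimal explicit-state sketch in Python. It is not the SMV model used in this work: the toy clock models, the value of VOTING_RESULT_TIME, and all function names are hypothetical. It computes the set of states satisfying AF p as a least fixed point, i.e., p holds in the state, or every successor already satisfies AF p.

```python
VOTING_RESULT_TIME = 6  # hypothetical: end of Round 3, in clock ticks


def check_af(states, successors, holds):
    """Return the set of states satisfying CTL AF p (least fixed point)."""
    satisfied = {s for s in states if holds(s)}
    changed = True
    while changed:
        changed = False
        for s in states:
            nxt = successors(s)
            if s not in satisfied and nxt and all(t in satisfied for t in nxt):
                satisfied.add(s)
                changed = True
    return satisfied


def ticking_clock(clock):
    # Deterministic toy model: the clock advances until the vote time, then stays there.
    return [min(clock + 1, VOTING_RESULT_TIME)]


def stalling_clock(clock):
    # Faulty toy variant: at tick 2 the clock may also stall forever.
    nxt = ticking_clock(clock)
    return nxt + [clock] if clock == 2 else nxt


def vote_time(clock):
    # Mirrors VoteTime = (GlobalClock = VotingResultTime)
    return clock == VOTING_RESULT_TIME


states = range(VOTING_RESULT_TIME + 1)
print(0 in check_af(states, ticking_clock, vote_time))   # True
print(0 in check_af(states, stalling_clock, vote_time))  # False
```

In the deterministic toy model every path from the initial state reaches the vote time, so AF (VoteTime) holds. In the stalling variant one path never reaches it, so the property fails, which is the kind of violation a liveness check is meant to expose.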


The model checking results of the bounded model of the algorithm have verified the correctness of the algorithm 
for fully connected networks with K ≥ 3F + 1 nodes, for both node-fault and link-fault models, and for the following 
scenarios: F = 0, 1, 2, and 3 simultaneous faults. In addition, the results have confirmed the claims of determinism 
and independence of the algorithm from F. 


VI. Conclusion 


Distributed systems have become an integral part of safety-critical computing applications, necessitating system 
designs that incorporate complex fault-tolerant resource management functions to provide globally coordinated 
operations with ultra-reliability. As a result, robust clock synchronization has become a required fundamental 
component of fault-tolerant safety-critical distributed systems. The main issue in solving the clock synchronization 
problem for the general case is a lack of symmetric view in the system at the participating good nodes. We first 
enumerated several ways of achieving message symmetry across the system, and then presented an alternative, 
referred to as the 3ROM algorithm, that guarantees agreement in a system in three rounds provided that there are 
K ≥ 3F+1 nodes, each faulty node induces no more than F faults, and each good node experiences no more than F faults. 
The algorithm is based on the Oral Message algorithm of Lamport, Shostak, and Pease, is scalable with respect to the 
number of nodes in the system, and applies equally to the traditional node-fault model and the link-fault model. 
The 3ROM algorithm is independent of the fault model (node-fault or link-fault) and of the number of faults (in 
terms of the number of required rounds, not the number of messages), and has a message complexity of O(K³). 
We also presented a mechanical verification of the algorithm for up to three simultaneous Byzantine faults. The 
model-checking effort was focused on verifying the correctness of a bounded model of the algorithm as well as 
confirming claims of determinism. The underlying topology in this paper was a fully connected graph and we leave 
the generalization of our solution to other topologies, including an arbitrary graph that meets the minimum 
requirements on the number of nodes and connectivity, to future work. 